Exploring the landscape of symbiotic diversity and distribution in unicellular ciliated protists

Background The eukaryotic-bacterial symbiotic system plays an important role in various physiological, developmental, and evolutionary processes. However, our current understanding is largely limited to multicellular eukaryotes without adequate consideration of diverse unicellular protists, including ciliates. Results To investigate the bacterial profiles associated with unicellular organisms, we collected 246 ciliate samples spanning the entire Ciliophora phylum and conducted single-cell based metagenome sequencing. This effort has yielded the most extensive collection of bacteria linked to unicellular protists to date. From this dataset, we identified 883 bacterial species capable of cohabiting with ciliates, unveiling the genomes of 116 novel bacterial cohabitants along with 7 novel archaeal cohabitants. Highlighting the intimate relationship between ciliates and their cohabitants, our study unveiled that over 90% of ciliates coexist with bacteria, with individual hosts fostering symbiotic relationships with multiple bacteria concurrently, resulting in the observation of seven distinct symbiotic patterns among bacteria. Our exploration of symbiotic mechanisms revealed the impact of host digestion on the intracellular diversity of cohabitants. Additionally, we identified the presence of eukaryotic-like proteins in bacteria as a potential contributing factor to their resistance against host digestion, thereby expanding their potential host range. Conclusions As the first large-scale analysis of prokaryotic associations with ciliate protists, this study provides a valuable resource for future research on eukaryotic-bacterial symbioses. Video Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s40168-024-01809-w.

The median of unclassified contigs assembled from ciliate samples were 17.18%, 30.77% and 39.26% at the phylum, class and order levels, respectively, compared to 6.66%, 13.78% and 17.60% for the human stool samples.Box-plot elements are defined as: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers.P values were measured by twotailed Wilcoxon test.n, number of specimens.The proportion was calculated by dividing the total length of unclassified contigs by the total length of all contigs, and contigs smaller than 1 kb were filtered out.

Figure S2. New symbionts in ciliates. (A)
An approximately maximum-likelihood phylogenetic tree of Syntrophorhabdia was constructed.(B) An approximately maximum-likelihood phylogenetic tree of Syntrophorhabdia was constructed.From left to right: Maximum-likelihood phylogenetic tree, genome size, GC content, completeness, contamination rate, and rSGB ID.The Escherichia coli (GCF_000005845) was used as outgroup.The scale bar represents the mean number of nucleotide substitutions per site.Bootstrap values are indicated on the tree.(C) Average nucleotide identity assessment was performed to compare three symbionts (rSGB118, rSGB154, and rSGB56) with the representative species Syntrophorhabdus aromaticivorans.(D) The host species of the three symbionts.(E) The approximately maximum-likelihood phylogenetic tree of the order Legionellales.(F) Genomic characteristics of arepresentative endosymbiont, namely rSGB136.From the outer ring to the inner ring arethe genes of positive strand.GC content.GC skew and negative strand genes.respectively.High abundance of CAB is significantly enriched in mammalian pathogens.The significance was determined using the hypergeometric test.The blue bar represents CAB, while the orange bar represents mammal pathogens.Genera with a maximum abundance greater than 10% in the specimen were defined as high-abundance genera.

Figure S1 .
Figure S1.Metagenome assembly and identification of ciliate-associated bacteria.(A) The length distribution of bacterial contigs.A total of 6,042,995 ciliate-associated bacterial contigs were analyzed.The mean contig length was 1197.44 bp.(B) Distribution of bacterial content among specimens.The bacterial-derived reads accounted for 4.44% of the total sequencing data.The density was generated automatically using the "geom_density" function in ggplot.A transcript per million (TPM) like calculation to normalize the length of the contig and the sequencing depth of the sample when assessing the coverage of the contig.(C-E) Accuracy evaluation of contig classification.(C) The distribution of TPM and GC content between bacterial and host contigs.(D) Comparison of GC content (left) and TPM (right) between bacterial contigs and randomly selected control contigs for all specimens.The dashed line indicates the P value of 0.05.(E) The accuracy of contig taxonomic classification evaluated by another software.Numbers in brackets are 95% confidence intervals for the means.(F) Comparison of unclassified contigs in ciliate samples and human stool samples.The median of unclassified contigs assembled from ciliate samples were 17.18%, 30.77% and 39.26% at the phylum, class and order levels, respectively, compared to 6.66%, 13.78% and 17.60% for the human stool samples.Box-plot elements are defined as: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers.P values were measured by twotailed Wilcoxon test.n, number of specimens.The proportion was calculated by dividing the total length of unclassified contigs by the total length of all contigs, and contigs smaller than 1 kb were filtered out.

Figure S3 .
Figure S3.Characteristics of ciliate-associated bacteria.(A-B) The taxonomy of representative single genomic bins.(A) The count of novel rSGB in each phylum or clade.(B) The percentage of novel bacterial rSGBs.(C-H) Relationship between eukaryotic like proteins (ELPs) and symbionts.(C) ELPs were identified in 73.5% of CABs.(D) The distribution of the ELP counts in CAB species.The mean and median of the ELPs were 11.3 and 6, respectively.(E) KEGG functional classification of the ELPs.The color of bars represents the proportion of total ELPs.(F) Spearman correlation between the number of ELPs in CABs and their abundance in the host.(G-H) Spearman correlation between the proportion of ELPs in CAB and their prevalence (G) and abundance (H) in the host.(I) Detection of CABs in ciliates at species and genus level.(J) Distribution of 883 CAB species among 246 specimens.Each short bar denotes one specimen.

Figure S5 .
Figure S5.(A) Enrichment analysis of different CAB patterns at the host's class level.P-values were calculated using Fisher's test.(B-G) Variation in the bacterial community at the individual level of ciliates.The rows of the heatmap represent the two biological replicates of the host, the columns represent different symbiotic bacteria, and the colors indicate the abundance of the symbiotic bacteria.

Figure S6 .
Figure S6.Factors influencing the microbiota structure and functional analysis.(A) t-SNE analysis was conducted to examine the influencing factors of the ciliate symbiotic community.The R 2 value and P-value were calculated using the Permanova test.Each dot on the plot represents a distinct ciliate species, with the same color and shape indicating samples from the same sampling site.(B-C) Spearman correlation analysis was performed to assess the relationship between different biological replicates.The left panel (B) represents the analysis for Certesia quadrinucleata, while the right panel (C) represents the analysis for Euplotes woodruffi.(D) Pearson correlation analysis was employed to examine the correlation between gene content and the Shannon index of intracellular symbionts.The digestibility of food vesicles was defined by the inclusion of genes associated with endocytosis, phagosome, lysosome, and peroxisome.(E) The frequency of CAB genes from different functional classes was examined across 91 ciliate species.

Figure S7 .
Figure S7.The potential reservoirs for other symbiotic systems (A) The Venn diagram illustrates the overlap between CABs and bacteria associated with marine invertebrates.(B) The Venn diagram represents the overlap between CABs and mammal pathogens.Fisher's test was used for statistical analysis in panels E and F. (C)High abundance of CAB is significantly enriched in mammalian pathogens.The significance was determined using the hypergeometric test.The blue bar represents CAB, while the orange bar represents mammal pathogens.Genera with a maximum abundance greater than 10% in the specimen were defined as high-abundance genera.

Table S2 . Taxonomic and integrity information of 148 representative single genomic bins. samleName Count indicates
the number of samples assembled for the bin; RepresentBin ANI represents the maximum ANI value of the genome as compared to the refence; RepresentBin refID indicates the ID of the largest ANI, and RepresentBin refSpecies indicates the name of the corresponding species.